
14 research outputs found

What Your Username Says About You

Authors: Aaron Jaech, Mari Ostendorf
Publication date: 01/01/2015

Usernames are ubiquitous on the Internet, and they are often suggestive of user demographics. This work looks at the degree to which gender and language can be inferred from a username alone by making use of unsupervised morphology induction to decompose usernames into sub-units. Experimental results on the two tasks demonstrate the effectiveness of the proposed morphological features compared to a character n-gram baseline.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Talking to the crowd: What do people react to in online discussions?

Authors: Hao Fang, Hannaneh Hajishirzi, Aaron Jaech, Mari Ostendorf, Victoria Zayats
Publication date: 01/01/2015

This paper addresses the question of how language use affects community reaction to comments in online discussion forums, and the relative importance of the message vs. the messenger. A new comment ranking task is proposed based on community-annotated karma in Reddit discussions, which controls for topic and timing of comments. Experimental work with discussion threads from six subreddits shows that the importance of different types of language features varies with the community of interest.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Hierarchical Character-Word Models for Language Identification

Authors: Shobhit Hathi, Aaron Jaech, George Mulcaire, Mari Ostendorf, Noah A. Smith
Publication date: 01/01/2016

Social media messages' brevity and unconventional spelling pose a challenge to language identification. We introduce a hierarchical model that learns character and contextualized word-level representations for language identification. Our method performs well against strong baselines and can also reveal code-switching.

arXiv.org e-Print Archive

Crossref